Patent Abstract:
A method of detecting product views comprising: obtaining, at an image processing controller, depth measurements, the depth measurements representing a support structure that supports a plurality of product views, image data, the image data representing the support structure, and a set of region of interest (ROI) indicators, where each ROI indicator indicates a position of a plurality of the product views; generating a first set of candidate view edges from the depth measurements; generating a second set of candidate view edges from the image data; generating a third set of candidate view edges by combining the first and second sets; generating a candidate view boundary for each adjacent pair of candidate view edges in the third set of candidate view edges; selecting a subset of output view boundaries from the candidate view boundaries based on the ROI indicators; and storing the output view boundaries in a memory coupled to the image processing controller.
Publication number: BE1027283B1
Application number: E20205281
Filing date: 2020-04-29
Publication date: 2021-02-25
Inventors: Joseph Lam; Yuanhao Yu; Abhishek Rawat
Applicant: Zebra Tech
IPC main classification:
Patent description:

METHOD, SYSTEM AND DEVICE FOR DETECTION OF PRODUCT VIEWS
BACKGROUND Environments in which objects are managed, such as retail facilities, storage and distribution facilities, and the like, may store those objects in regions such as aisles of racks or the like. For example, a retail facility may contain items such as products for sale, and a distribution facility may contain items such as packages or pallets. A mobile automation device can be deployed in such a facility to perform tasks at various locations. For example, a mobile automation device can be used to capture data representing an aisle and the corresponding products in a retail facility. However, the variability of the products in the facility, as well as variations in data capture conditions (e.g., lighting), may prevent accurate detection of the individual products and their status from such data.
SUMMARY OF THE INVENTION According to one aspect of the invention, there is provided a method by an image processing controller for detecting product views from recorded depth and image data, the method comprising: obtaining, at the image processing controller, (i) depth measurements from at least one depth sensor, the depth measurements representing a support structure that supports a plurality of product views, (ii) image data from at least one image sensor, the image data representing the support structure, and (iii) a set of region of interest (ROI) indicators, wherein each ROI indicator indicates a position of a plurality of the product views; generating, by a depth detector of the image processing controller, a first set of candidate view edges from the depth measurements; generating, by an image detector of the image processing controller, a second set of candidate view edges from the image data; generating, by a boundary generator of the image processing controller, (i) a third set of candidate view edges by combining the first and second sets, and (ii) a candidate view boundary for each adjacent pair of candidate view edges in the third set of candidate view edges; selecting, by the boundary generator of the image processing controller, a subset of output view boundaries from the candidate view boundaries based on the ROI indicators; and detecting, by the boundary generator of the image processing controller, product views including the selected subset of output view boundaries.
Optionally or additionally, generating the first set of candidate view edges may include generating a two-dimensional depth map from the depth measurements and detecting edges in the depth map.
Optionally or additionally, detecting edges in the depth map may include applying an edge detect operation to the depth map to generate an edge weighted depth map, and applying a line detect operation to the edge weighted depth map.
Optionally or additionally, the method may further comprise removing candidate edges from the first set that are not within a ROI indicator.
Optionally or additionally, generating the second set of candidate view edges may include selecting a plurality of windows from the image data, and classifying each window as either containing an edge or not containing an edge.
Optionally or additionally, the method may further comprise determining a position of each candidate view edge of the second set based on an intensity profile of a corresponding one of the windows.
Optionally or additionally, combining the first and second sets of candidate view edges may include determining, for each pair of the adjacent candidate view edges, whether a distance separating the pair is below a threshold, and replacing the pair with a single candidate view edge when the distance separating the pair is below the threshold.
Optionally or additionally, generating the candidate view boundaries may include bringing together upper ends of the adjacent pair of the third set of candidate view edges with an upper boundary segment, and bringing together lower ends of the adjacent pair of the third set of candidate view edges with a lower boundary segment.
Optionally or additionally, selecting the subset of output view boundaries may include removing candidate view boundaries that are outside the ROI indicators.
According to one aspect of the invention, a computing device is provided. The computing device includes a memory adapted to store (i) depth measurements from at least one depth sensor, the depth measurements representing a support structure that supports a plurality of product views, (ii) image data from at least one image sensor, the image data representing the support structure, and (iii) a set of region of interest (ROI) indicators, each ROI indicator indicating a position of a plurality of the product views. The computing device also includes an image processing controller including a depth detector configured to obtain the depth measurements and the ROI indicators, and to generate, from the depth measurements, a first set of candidate view edges, an image detector configured to obtain the image data and the ROI indicators, and to generate, from the image data, a second set of candidate view edges, and a boundary generator configured to generate a third set of candidate view edges by combining the first and second sets, generate a candidate view boundary for each pair of adjacent candidate view edges in the third set of candidate view edges, select a subset of output view boundaries from the candidate view boundaries based on the ROI indicators, and detect product views including the selected subset of output view boundaries.
Optionally or additionally, the depth detector may be configured to, in order to generate the first set of candidate view edges, generate a two-dimensional depth map from the depth measurements and detect edges in the depth map.
Optionally or additionally, the depth detector may further be configured to remove candidate edges from the first set that are not within a ROI indicator.
Optionally or additionally, the image detector may be configured to, in order to generate the second set of candidate view edges, select a plurality of windows from the image data, and classify each window as either containing an edge or not containing an edge.
Optionally or additionally, the image detector may be further configured to determine a position of each candidate view edge of the second set based on an intensity profile of a corresponding one of the windows.
Optionally or additionally, the boundary generator may be arranged to, in order to combine the first and second sets of candidate view edges, determine, for each pair of adjacent candidate view edges, whether a distance separating the pair is below a threshold, and to replace the pair with a single candidate view edge when the distance separating the pair is below the threshold.
Optionally or additionally, the boundary generator may be arranged to, in order to generate the candidate view boundaries, bring together upper ends of the adjacent pair of the third set of candidate view edges with an upper boundary segment, and bring together lower ends of the adjacent pair of the third set of candidate view edges with a lower boundary segment.
Optionally or additionally, the boundary generator may be arranged to, in order to select the subset of output view boundaries, remove candidate view boundaries that are outside the ROI indicators.
In accordance with one aspect of the invention, a non-transitory computer-readable medium is provided containing instructions executable by an image processing controller for configuring the image processing controller to obtain (i) depth measurements from at least one depth sensor, the depth measurements representing a support structure that supports a plurality of product views, (ii) image data from at least one image sensor, the image data representing the support structure, and (iii) a set of region of interest (ROI) indicators, each ROI indicator indicating a position of a plurality of the product views, generate a first set of candidate view edges from the depth measurements, generate a second set of candidate view edges from the image data, generate a third set of candidate view edges by combining the first and second sets, generate a candidate view boundary for each pair of adjacent candidate view edges in the third set of candidate view edges, select a subset of output view boundaries from the candidate view boundaries based on the ROI indicators, and detect product views including the selected subset of output view boundaries.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS The accompanying figures, where like reference numerals refer to identical or functionally similar elements throughout the separate views, together with the detailed description below, are incorporated in and form a part of the specification, and serve to further illustrate embodiments of concepts that include the claimed invention, and to explain various principles and advantages of those embodiments. FIG. 1 is a schematic of a mobile automation system. FIG. 2 shows a mobile automation device in the system of FIG. 1. FIG. 3 is a block diagram of certain internal components of the mobile automation device in the system of FIG. 1. FIG. 4 is a flowchart of a method for detecting product views in the system of FIG. 1. FIG. 5 is a diagram illustrating input data for the method of FIG. 4. FIG. 6 is a diagram illustrating a depth map generated at block 415 of the method of FIG. 4. FIG. 7 is a diagram illustrating candidate view edge detection at block 415 of the method of FIG. 4. FIG. 8 is a diagram illustrating candidate view edge detection at blocks 425 and 430 of the method of FIG. 4.
FIG. 9 is a diagram illustrating the combination of the first and second sets of candidate view edges at block 435 of the method of FIG. 4.
FIG. 10 is a diagram illustrating the generation and validation of candidate view boundaries at blocks 440 to 460 of the method of FIG. 4.
Those of skill in the art will appreciate that elements in the figures are illustrated for simplicity and clarity and are not necessarily drawn to scale. For example, the dimensions of some elements in the figures may be exaggerated with respect to other elements to improve understanding of embodiments of the present invention.
The apparatus and method components have been represented, where appropriate, by conventional symbols in the drawings, showing only those specific details that are relevant to an understanding of the embodiments of the present invention, so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.
DETAILED DESCRIPTION Examples disclosed herein are directed to a method by an image processing controller for detecting product views from recorded depth and image data, the method comprising: obtaining, at the image processing controller, (i) depth measurements from at least one depth sensor, the depth measurements representing a support structure that supports a plurality of product views, (ii) image data from at least one image sensor, the image data representing the support structure, and (iii) a set of region of interest (ROI) indicators, each ROI indicator indicating a position of a plurality of the product views; generating, by a depth detector of the image processing controller, a first set of candidate view edges from the depth measurements; generating, by an image detector of the image processing controller, a second set of candidate view edges from the image data; generating, by a boundary generator of the image processing controller: a third set of candidate view edges by combining the first and second sets, and a candidate view boundary for each adjacent pair of candidate view edges in the third set of candidate view edges; selecting, by the boundary generator of the image processing controller, a subset of output view boundaries from the candidate view boundaries based on the ROI indicators; and detecting, by the boundary generator of the image processing controller, product views including the selected subset of output view boundaries.
Additional examples disclosed herein are directed to a computing device comprising: a memory configured to store: (i) depth measurements from at least one depth sensor, the depth measurements representing a support structure that supports a plurality of product views, (ii) image data from at least one image sensor, the image data representing the support structure, and (iii) a set of region of interest (ROI) indicators, each ROI indicator indicating a position of a plurality of the product views; and an image processing controller comprising: a depth detector configured to: obtain the depth measurements and the ROI indicators, and generate, from the depth measurements, a first set of candidate view edges; an image detector configured to: obtain the image data and the ROI indicators, and generate, from the image data, a second set of candidate view edges; and a boundary generator configured to: generate a third set of candidate view edges by combining the first and second sets; generate a candidate view boundary for each pair of adjacent candidate view edges in the third set of candidate view edges; select a subset of output view boundaries from the candidate view boundaries based on the ROI indicators; and detect product views including the selected subset of output view boundaries.
Additional examples disclosed herein are directed to a non-transitory computer readable medium containing instructions executable by an image processing controller to configure the image processing controller to: obtain (i) depth measurements from at least one depth sensor, the depth measurements representing a support structure that supports a plurality of product views, (ii) image data from at least one image sensor, the image data representing the support structure, and (iii) a set of region of interest (ROI) indicators, each ROI indicator indicating a position of a plurality of the product views; generate, at a depth detector of the image processing controller, a first set of candidate view edges from the depth measurements; generate, at an image detector of the image processing controller, a second set of candidate view edges from the image data; generate, at a boundary generator of the image processing controller, a third set of candidate view edges by combining the first and second sets; generate, at the boundary generator of the image processing controller, a candidate view boundary for each pair of adjacent candidate view edges in the third set of candidate view edges; select, at the boundary generator of the image processing controller, a subset of output view boundaries from the candidate view boundaries based on the ROI indicators; and detect, by the boundary generator of the image processing controller, product views including the selected subset of output view boundaries.
FIG. 1 shows a mobile automation system 100 according to the teachings of this disclosure. The system 100 includes a server 101 in communication with at least one mobile automation device 103 (also referred to herein simply as the device 103) and at least one client computing device 104 via communication links 105, illustrated in the present example as including wireless links. In the present example, the links 105 are provided by a wireless local area network (WLAN) deployed through one or more access points (not shown). In other examples, the server 101, the client device 104, or both, are located remotely (i.e., outside the environment in which the device 103 is deployed), and the links 105 therefore include wide area networks such as the Internet, mobile networks, and the like. The system 100 in the present example also includes a dock 106 for the device 103. The dock 106 is in communication with the server 101 via a link 107, which in the present example is a wired link. In other examples, however, the link 107 is a wireless link.
The client computing device 104 is illustrated in FIG. 1 as a mobile computing device, such as a tablet, smartphone or the like. In other examples, the computing device 104 is implemented as a different type of computing device, such as a desktop, a laptop, another server, a kiosk, a monitor, and the like. The system 100 may include a plurality of client devices 104 in communication with the server 101 through respective connections 105.
In the illustrated example, the system 100 is deployed in a retail facility comprising a plurality of support structures such as racks 110-1, 110-2, 110-3 and so on (collectively referred to as racks 110 or shelves 110, and generically referred to as a rack 110 or a shelf 110 - this naming convention is also applied to other elements discussed herein). Each rack 110 supports a plurality of products 112. Each rack 110 includes a rack back wall 116-1, 116-2, 116-3 and a support surface (e.g., support surface 117-3 as illustrated in FIG. 1) extending from the rack back wall 116 to a shelf edge 118-1, 118-2, 118-3.
The racks 110 (also referred to as sub-regions of the facility) are typically arranged as a plurality of aisles (also referred to as regions of the facility), with each aisle comprising a plurality of racks 110 placed in sequence. In such arrangements, the shelf edges 118 face the aisles along which the customers of the retail facility, as well as the device 103, can move. As will be apparent from FIG. 1, the term "shelf edge" 118 as used herein, and which may also be referred to as the edge of a support surface (e.g., support surfaces 117), refers to a surface defined by adjacent surfaces of different inclination angles. In the example illustrated in Fig. 1, the shelf edge 118-3 is at an angle of about ninety degrees to the support surface 117-3 and to the underside (not shown) of the support surface 117-3. In other examples, the angles between shelf edge 118-3 and adjacent surfaces, such as support surface 117-3, are greater or less than ninety degrees.
The device 103 is equipped with a plurality of navigation and data capture sensors 108, such as image sensors (e.g., one or more digital cameras) and depth sensors (e.g., one or more Light Detection and Ranging (LIDAR) sensors, one or more depth cameras that make use of structured light patterns, such as infrared light, and the like). The device 103 is deployed in the retail facility and, via communication with the server 101 and using the sensors 108, navigates autonomously or partially autonomously along a length 119 of at least a portion of the shelves 110.
As the device 103 navigates between the shelves 110, it can capture images, depth measurements, and the like that represent the shelves 110 (commonly referred to as shelf data or captured data). Navigation can be performed in accordance with a frame of reference 102 established in the retail facility. To this end, the device 103 situates its pose (i.e., location and orientation) in the frame of reference 102.
The server 101 comprises a specialized controller, such as a processor 120, specifically adapted to control and/or assist the mobile automation device 103 in navigating the environment and in capturing data. The processor 120 is also specifically configured, as will be described in detail herein, to process image data and depth measurements captured by the device 103, with the image data and depth measurements representing the racks 110, in order to detect product views on the racks 110. Those skilled in the art will appreciate that a product view is a single instance of a product facing the aisle. Thus, if a support surface 117 carries three identical adjacent products, then the products represent three separate product views. The resulting selected product views may be provided to a mechanism for detecting a product status (the mechanism may also be implemented by the processor 120 itself).
The processor 120 is interconnected with a non-transitory computer readable storage medium, such as a memory 122. The memory 122 includes a combination of volatile memory (e.g., random access memory (RAM)) and non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory). The processor 120 and the memory 122 each include one or more integrated circuits. In some embodiments, the processor 120 is configured as one or more central processing units (CPUs) and one or more graphics processing units (GPUs).
The memory 122 stores computer-readable instructions for performing various functionalities, including controlling the device 103 to navigate the racks 110 and capture shelf data, as well as post-processing of the shelf data. The execution of the foregoing instructions by the processor 120 configures the server 101 to perform various operations described herein. The applications stored in the memory 122 include an artifact detection application 124 (also referred to simply as the application 124). Typically, the processor 120 performs various actions, through execution of the application 124 or subcomponents thereof in conjunction with other components of the server 101, to detect, in the image data and depth measurements representing the racks 110 (e.g., data captured by the device 103), individual product views, which are later processed to detect product status information (e.g., whether products are out of stock, misplaced, or the like).
Certain exemplary components of the application 124 are shown in FIG. 1, comprising an image detector 126 and a depth detector 128. The image and depth detectors 126 and 128 respectively detect product edges in images and depth measurements, such as those captured by the device 103. The application 124 also includes a boundary generator 130 that receives the detected edges from the detectors 126 and 128, and generates product view boundaries from the edges. In other embodiments, application 124 may be implemented as a bundle of logically separate applications, each implementing an appropriate portion of the functionality described below. For example, detectors 126 and 128, as well as boundary generator 130, may be implemented as separate applications.
The memory 122 may also store data for use in the aforementioned control of the device 103, such as a repository 132 containing a map of the retail environment and other appropriate data (e.g., operational limitations for use in controlling the device 103, data recorded by the device 103, and the like).
The processor 120, as configured through the execution of the application 124, is also referred to herein as the image processing controller 120, or simply the controller 120. As will now be appreciated, some or all of the functionality implemented by the image processing controller 120 and described below may also be performed by pre-configured, specialized hardware controllers (e.g., one or more logic circuit arrays configured specifically to optimize image processing speed, for example field programmable gate arrays (FPGAs) and/or application specific integrated circuits (ASICs) configured for this purpose) rather than by execution of the application 124 by the processor 120.
The server 101 also includes a communications interface 134 connected to the processor 120. The communications interface 134 includes suitable hardware (e.g., transmitters, receivers, network interface controllers, and the like) to allow the server 101 to communicate with other computing devices - particularly the device 103, the client device 104 and the dock 106 - via the links 105 and 107. The links 105 and 107 may be direct links, or links that traverse one or more networks, including both local and wide area networks. The specific components of the communication interface 134 are selected based on the type of network or other links over which the server 101 is required to communicate. In the present example, as previously mentioned, a wireless local area network is implemented in the retail facility through the deployment of one or more wireless access points. To this end, the links 105 comprise either or both of wireless links between the device 103 and the client device 104 and the aforementioned access points, and a wired link (e.g., an Ethernet-based link) between the server 101 and the access point.
The processor 120 can thus obtain data captured by the device 103 via the communication interface 134 for storage (e.g., in the repository 132) and subsequent processing (e.g., to detect product views, as noted above). The server 101 may also send status notifications (e.g., notifications indicating that a product is out of stock, low in stock, or misplaced) to the client device 104 in response to the determination of product status data. The client device 104 includes one or more controllers (e.g., central processing units (CPUs) and/or field programmable gate arrays (FPGAs)) configured to process (e.g., display) notifications received from the server 101.
FIG. 2 shows the mobile automation device 103 in more detail. The device 103 includes a chassis 201 with a propulsion assembly 203 (e.g., one or more electric motors, drive wheels, tracks, or the like). The device 103 further includes a sensor mast 205 supported on the chassis 201 and, in the present example, extending upward (e.g., substantially vertically) from the chassis 201. The mast 205 supports the previously mentioned sensors 108. The sensors include, in the present example, at least one image sensor 207, such as a digital camera. In the present example, the mast 205 supports seven digital cameras 207-1 to 207-7 facing the racks 110. The mast 205 also supports at least one depth sensor 209, such as a 3D digital camera capable of capturing both depth data and image data. The device 103 also includes additional depth sensors, such as LIDAR sensors 211. In the present example, the mast 205 supports two LIDAR sensors 211-1 and 211-2. In other examples, the mast 205 can support additional LIDAR sensors 211 (e.g., four LIDARs 211). As shown in FIG. 2, the cameras 207 and the LIDAR sensors 211 are arranged on one side of the mast 205, while the depth sensor 209 is arranged on a front of the mast 205. In other words, the depth sensor 209 is directed forward (i.e., in the direction of movement of the device 103), while the cameras 207 and LIDAR sensors 211 face sideways (i.e., capture data alongside the device 103, in a direction perpendicular to the direction of movement). In other examples, the device 103 includes additional sensors, such as one or more RFID readers, temperature sensors, and the like.
The mast 205 also supports a plurality of lighting assemblies 213, here lighting assemblies 213-1 to 213-7, arranged to illuminate the fields of view of the respective cameras 207. That is, lighting assembly 213-1 illuminates the field of view of the camera 207-1, and so on. The cameras 207 and LIDARs 211 are oriented on the mast 205 such that their fields of view are each directed toward a rack 110 along whose length 119 the device 103 is moving. As previously mentioned, the device 103 is configured to situate a pose of the device 103 (e.g., a location and orientation of the center of the chassis 201) in the frame of reference 102, whereby the data captured by the device 103 can be registered in accordance with the frame of reference 102 for subsequent processing.
Referring to FIG. 3, certain components of the mobile automation device 103 are shown, in addition to the above-mentioned cameras 207, depth sensor 209, LIDARs 211, and lighting assemblies 213. The device 103 includes a specialized controller, such as a processor 300, connected to a non-transitory computer readable storage medium, such as a memory 304. The memory 304 includes a suitable combination of volatile memory (e.g., random access memory (RAM)) and non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory). The processor 300 and the memory 304 each include one or more integrated circuits. The memory 304 stores computer readable instructions for execution by the processor 300.
In particular, the memory 304 stores a device control application 308 which, when executed by the processor 300, configures the processor 300 to perform various functions related to navigating the facility and controlling the sensors 108 to capture data, e.g., in response to instructions from the server 101. Those of skill in the art will appreciate that the functionality implemented by the processor 300 through the execution of the application 308 may, in other embodiments, also be implemented by one or more specifically designed hardware and firmware components, such as FPGAs, ASICs and the like.
The memory 304 also optionally includes a repository 312 containing, for example, a map of the environment in which the device 103 operates, for use during the execution of the application 308. The device 103 also includes a communication interface 316 enabling the device 103 to communicate with the server 101 (e.g., through the link 105 or through the dock 106 and the link 107), for example to receive instructions for navigating to specific locations and initiating data capture operations.
In addition to the aforementioned sensors, the device 103 includes a motion sensor 318, such as one or more odometers coupled to the propulsion assembly 203. The motion sensor 318 may also, in addition to or instead of the aforementioned odometer(s), include an inertial measurement unit (IMU) configured to measure acceleration along a plurality of axes.
The operations performed by the server 101, and in particular by the processor 120 as configured by executing the application 124, to detect product views from captured data representing the racks 110 (e.g., images and depth measurements captured by the device 103), are now discussed in more detail with reference to FIG. 4. FIG. 4 illustrates a method 400 for detecting product views. The method 400 will be described in conjunction with its implementation in the system 100, and in particular by the server 101, with respect to the components illustrated in FIG. 1. As will be appreciated below, in other examples, some or all of the processing described below as performed by the server 101 may alternatively be performed by the device 103.
The actions performed by the server 101 during the execution of the method 400 implement three stages of processing. The first and second stages can be performed in any order relative to each other (including simultaneously). In the first stage, the server 101 (and more specifically the depth detector 128) detects candidate product view edges from the depth measurements, such as those captured by the depth camera 209 or the LIDARs 211. In the second stage, the server 101 (and more specifically the image detector 126) detects candidate product view edges from the image data, such as the images captured by the cameras 207. In the third stage, which follows the completion of the first and second stages, the server 101 (and more specifically the boundary generator 130) generates product view boundaries by combining and processing the candidate view edges from the first and second stages.
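For orientation only, the three-stage flow might be organized along the lines of the following Python sketch. The function names (detect_edges_from_depth, detect_edges_from_image, generate_view_boundaries, detect_product_views) and the stub bodies are assumptions introduced for illustration; they do not appear in the disclosure, and the internals of each stage are sketched in the later examples in this description.

```python
# Structural sketch only: the stage functions are placeholders whose
# internals are illustrated by later examples in this description.

def detect_edges_from_depth(point_cloud, rois):
    """First stage (depth detector 128): candidate view edges from depth data."""
    return []

def detect_edges_from_image(image, rois):
    """Second stage (image detector 126): candidate view edges from image data."""
    return []

def generate_view_boundaries(depth_edges, image_edges, rois):
    """Third stage (boundary generator 130): combine edges into view boundaries."""
    return []

def detect_product_views(point_cloud, image, rois):
    # The first two stages are independent and could run in either order
    # (or concurrently); the third stage consumes the output of both.
    depth_edges = detect_edges_from_depth(point_cloud, rois)
    image_edges = detect_edges_from_image(image, rois)
    return generate_view_boundaries(depth_edges, image_edges, rois)
```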
At block 405, corresponding to the above-mentioned first stage, the server 101 obtains a plurality of depth measurements (which may also be referred to as a point cloud), for example by retrieving the depth measurements from the memory 122. The depth measurements may have been previously captured by the device 103 and provided to the server 101, e.g., via the links 105 and/or 107. The depth measurements are assumed, in the present example, to be defined by coordinates in the frame of reference 102. However, in other examples, the depth measurements may be obtained in the format in which they were captured, prior to registration to the common frame of reference 102. Furthermore, as mentioned above, the device 103 includes a plurality of depth sensors (e.g., the two LIDARs 211 shown in FIG. 2). The process described below can be repeated independently for each set of depth measurements from a given depth sensor, and the results registered to the frame of reference 102 only at a later point.
The server 101 also retrieves, at block 405, a set of region of interest (ROI) indicators, each ROI indicator indicating a position of a plurality of the product views. The ROI indicators may also be referred to as product regions, as the server 101 can generate ROI indicators by performing appropriate region detection operations on images from the racks 110. The ROI indicators are typically bounding boxes defined in the frame of reference 102.
At block 410, corresponding to the above-mentioned second stage, the server 101 obtains image data in the form of one or more images of the racks 110, e.g., captured by the device 103. As with the above-mentioned depth measurements, the images need not be registered in the frame of reference 102 at this point, but for clarity the images are, in the discussion below, assumed to be registered in the frame of reference 102. The server 101 also retrieves the above-mentioned ROI indicators at block 410, since the ROI indicators are deployed in both the first and second stages of the method 400.
FIG. 5 illustrates examples of input data obtained by the server 101 at blocks 405 and 410. In particular, FIG. 5 shows a point cloud 500 comprising a plurality of depth measurements representing a rack 110 that supports a plurality of different products 112. In particular, the point cloud 500 shows two views of a first product 112a, two views of a second product 112b, and three views of a third product 112c.
FIG. 5 also illustrates an image 504 of the rack 110 represented by the point cloud 500. The image may, for example, be an RGB image captured by the device 103 approximately simultaneously with the capture of the point cloud 500. The image 504 shows (albeit in two dimensions rather than three) the same shelf edges 118, rack back wall 116 and products 112a, 112b and 112c as the point cloud 500. Finally, FIG. 5 shows a set of ROI indicators 508a, 508b and 508c indicating the positions of product regions detected by, for example, the server 101 or another computing device. Detection of the ROI indicators can be performed based on any suitable combination of depth measurements and images, including the point cloud 500 and the image 504. The ROI indicators are shown in two dimensions, but can also be defined in three dimensions according to the frame of reference 102. As is clear from FIG. 5, the ROI indicators 508 correspond to the respective positions of the products 112a, 112b and 112c, but do not distinguish between the individual product views shown in the point cloud 500 and the image 504. The spaces between the products on the shelves 110 can often be small enough that detecting the individual product views, in the absence of the processing techniques described below, is imprecise or computationally too burdensome.
Returning to FIG. 4, the first stage of the method 400 (depth-based detection of candidate view edges, as performed by the depth detector 128) will be described prior to a description of the second stage (image-based detection of candidate view edges, as performed by the image detector 126). However, it will be understood that the order of the first and second stages can be reversed, and that the first and second stages can also be performed simultaneously.
At block 415, the server 101 generates a depth map from the point cloud 500 obtained at block 405. The depth map is a two-dimensional representation of the point cloud 500, with each aisle-facing depth measurement (i.e., one not occluded from the aisle by another depth measurement) in the point cloud assigned an intensity based on its depth (i.e., its dimension along the Y axis of the frame of reference 102). FIG. 6 illustrates the point cloud 500 along with the frame of reference 102, as well as a two-dimensional depth map 600 generated from the point cloud 500 at block 415. Intensity values are assigned to the depth map 600 such that points of a lower depth (i.e., closer to the aisle and further away from the rack back wall 116) are darker, while points of greater depth are lighter. Different intensity scales than those illustrated in FIG. 6 can also be applied (e.g., points at greater depths may be darker, while points at shallower depths may be lighter).
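A minimal sketch of the depth-map construction at block 415 follows, assuming the point cloud is an N×3 NumPy array in the frame of reference 102 with the Y axis pointing from the aisle toward the rack back wall. The grid resolution, the uint8 scaling and the function name are assumptions made for illustration, not taken from the disclosure.

```python
import numpy as np

def depth_map_from_point_cloud(points, resolution=0.005):
    """Project a point cloud (N x 3 array, columns X/Y/Z in frame 102) onto
    the X-Z plane, keeping for each pixel the measurement closest to the
    aisle (smallest Y) so that occluded points are discarded."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    cols = ((x - x.min()) / resolution).astype(int)
    rows = ((z.max() - z) / resolution).astype(int)
    depth = np.full((rows.max() + 1, cols.max() + 1), np.inf)
    for r, c, d in zip(rows, cols, y):
        if d < depth[r, c]:
            depth[r, c] = d                      # aisle-facing measurement wins
    depth[np.isinf(depth)] = depth[np.isfinite(depth)].max()  # empty pixels -> far
    # Scale to 0..255 so that shallow (near-aisle) pixels are dark and deep
    # pixels (e.g. near the rack back wall 116) are light.
    span = depth.max() - depth.min()
    return (255 * (depth - depth.min()) / (span if span else 1.0)).astype(np.uint8)
```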
At block 415, the server 101 also detects edges in the depth map 600. Edge detection at block 415 can be performed by applying any suitable edge detection operation, or combination of edge detection operations, to the depth map 600. In the present example, the server 101 performs edge detection at block 415 by applying a Sobel kernel to the depth map 600, followed by a Hough transform. As will be appreciated by those of skill in the art, the Sobel kernel highlights the areas of the depth map 600 with strong gradients (which are therefore likely to represent an edge). In complex environments such as those found in a retail setting, the output of the Sobel kernel may contain noise. The Hough transform is therefore used to detect continuous lines in the potentially unclear edges highlighted by the Sobel kernel. The Hough transform, in the present embodiment, is configured to detect only vertical lines (or lines with an orientation within a certain limit from vertical, e.g., 5 degrees), because the edges of the product views are assumed to be vertical. FIG. 7 illustrates an exemplary execution of the edge detection at block 415 by applying a Sobel kernel and a Hough transform to the depth map 600. The output of the Sobel kernel is shown as a processed depth map 700 (which is also an edge-weighted depth map). As is evident from the processed depth map 700, certain edges of the products 112 are highlighted with high intensities (i.e., are lighter), while areas of the depth map 600 that do not correspond to edges have low intensities (i.e., are darker). FIG. 7 also illustrates a set 704 of candidate view edges derived from the processed depth map 700 using a Hough transform or any other suitable line detection operation. The set 704 includes, in particular, candidate view edges 708a corresponding to the products 112a, candidate view edges 708b corresponding to the products 112b, and candidate view edges 708c corresponding to the products 112c. The candidate view edges 708a and 708b as shown in FIG. 7 are incomplete, however. More specifically, the inner edges of the products (where each product 112 is placed relatively close to another product 112) were not detected. Furthermore, the set 704 includes an edge 712 that does not represent any product 112. The edge 712 may be an artifact, corresponding to one side of the rack 110, or the like. Additionally, the candidate view edges 708b and 708c include separate observations of certain adjacent edges, such as the pair of edges 716.
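A sketch of the Sobel-plus-Hough edge detection described for block 415 is shown below, using OpenCV. The 5-degree verticality tolerance follows the text, while the kernel size, the binarization threshold and the Hough parameters are assumed values chosen only to make the example concrete.

```python
import cv2
import numpy as np

def candidate_edges_from_depth_map(depth_map, max_tilt_deg=5.0):
    """Apply a Sobel kernel to highlight horizontal gradients (i.e. vertical
    edges), threshold the result, and extract near-vertical line segments
    with a probabilistic Hough transform."""
    grad_x = cv2.Sobel(depth_map, cv2.CV_64F, 1, 0, ksize=3)
    edge_weighted = cv2.convertScaleAbs(grad_x)      # edge-weighted depth map (cf. 700)
    _, binary = cv2.threshold(edge_weighted, 40, 255, cv2.THRESH_BINARY)
    lines = cv2.HoughLinesP(binary, 1, np.pi / 180, 50,
                            minLineLength=30, maxLineGap=10)
    edges = []
    for x1, y1, x2, y2 in (lines.reshape(-1, 4) if lines is not None else []):
        tilt = abs(np.degrees(np.arctan2(x2 - x1, y2 - y1)))  # 0 deg = vertical
        if tilt <= max_tilt_deg or 180.0 - tilt <= max_tilt_deg:
            edges.append(((x1, y1), (x2, y2)))       # keep near-vertical lines only
    return edges
```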
Returning to FIG. 4, at block 420, the server 101 groups the candidate edges detected at block 415, and filters the candidate edges based on the ROI indicators 508. For example, for each pair of candidate edges, the server 101 may determine whether the distance between the edges is below an adjustable limit value. When the determination is affirmative, the server 101 converts the pair of edges into a single edge, e.g., with a position generated from the average of the pair. Thus, referring back to FIG. 7, a final set 720 of candidate edges is shown in which the double edge observations, including the pair 716, are grouped into single edges.
Additionally, the ROI indicators 508 are overlaid on the set 720 of candidate edges in FIG. 7. At block 420, the server 101 evaluates whether any candidate edge of the set 704 of candidate edges is not within a ROI indicator 508. Any edge that is not within a ROI indicator 508 is removed, as the ROI indicators 508 indicate the presence of a product 112, and any edge outside the ROI indicators 508 is thus not a product-related edge. The edge 712 has therefore been removed from the set 720.
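A compact sketch of the grouping and ROI filtering at block 420 follows, reducing each candidate edge to its x-position and each ROI indicator to a horizontal interval; the merge distance and these simplified representations are assumptions made for the example.

```python
def group_and_filter_edges(edge_x_positions, rois, merge_dist=0.02):
    """Block 420 sketch: merge candidate edges whose x-positions are closer
    together than merge_dist (keeping their average), then drop any edge
    that falls outside every ROI, since such an edge (like edge 712)
    cannot belong to a product. Each ROI is an (x_min, x_max) interval."""
    groups = []
    for x in sorted(edge_x_positions):
        if groups and x - groups[-1][-1] < merge_dist:
            groups[-1].append(x)                 # same physical edge observed twice
        else:
            groups.append([x])
    averaged = [sum(g) / len(g) for g in groups]
    return [x for x in averaged if any(lo <= x <= hi for lo, hi in rois)]
```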
Following the completion of block 420, the depth detector 128 passes the final set 720 of first-stage candidate view edges to the boundary generator 130 for further processing. Before discussing the boundary generator 130 in further detail, however, the second stage of the method 400 will be described below.
Returning to FIG. 4, the second stage begins at block 410 as previously described. After obtaining the image 504 and the ROI indicators 508 at block 410, the server 101 is configured, at block 425, to select a plurality of windows from the image 504 based on the ROI indicators 508. The server 101 then classifies each of the selected windows as either containing a product edge or not containing a product edge.
FIG. 8 illustrates the process of selecting and classifying the windows at block 425. In particular, the server 101 selects regions 800a, 800b and 800c of the image 504 within the ROI indicators 508. In other words, regions of the image 504 outside the ROI indicators 508 are not processed at block 425. After selecting the regions 800, the server 101 divides each region 800 into a plurality of windows 804a, 804b, 804c. Each window has a height equal to the height of the corresponding region 800, and a width equal to an adjustable window width (e.g., stored in the memory 122). The windows 804 overlap, as shown in FIG. 8, such that any part of a given region 800 appears in multiple windows.
Each window 804 is classified as either containing or not containing a product view edge by providing the window 804 to a classifier, such as a pre-trained convolutional neural network (CNN) or any other suitable classifier. As will be appreciated, the classifier has been previously trained on a training set of images, e.g., with positively and negatively labeled examples of windows containing (or not containing) product view edges. The classifier returns, for each window, a binary indication as to whether or not the window contains a product view edge.
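The window selection and classification of block 425 can be sketched as a sliding window over each ROI region. The window width, the stride and the classify_window callable (a stand-in for the pre-trained CNN, whose architecture the text does not specify) are assumptions for this example.

```python
def windows_with_edges(image, roi, window_width=48, stride=8, classify_window=None):
    """Block 425 sketch: slice the image region inside one ROI (x0, y0, x1, y1)
    into overlapping, full-height windows and keep those that the classifier
    labels as containing a product view edge."""
    x0, y0, x1, y1 = roi
    region = image[y0:y1, x0:x1]
    hits = []
    for left in range(0, region.shape[1] - window_width + 1, stride):
        window = region[:, left:left + window_width]
        # classify_window stands in for the pre-trained CNN; it returns True
        # when the window contains a product view edge.
        if classify_window is not None and classify_window(window):
            hits.append((x0 + left, x0 + left + window_width))
    return hits
```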
As will now be appreciated, overlapping the windows 804 results in each product view edge being detected in a plurality of windows. FIG. 8 illustrates three windows 804a-1, 804a-2 and 804a-3 each containing the edge 808 between the products 112a. Therefore, at block 430, the server 101 groups overlapping windows 804 that have been classified as containing an edge, on the assumption that all windows in a group of overlapping windows 804 classified as containing an edge contain the same edge.
In particular, as shown in FIG. 8, the server 101 selects a single position for the edge 808, for example by generating an intensity profile 812 for the portion of the image 504 that lies within the combined area of the windows 804a-1, 804a-2 and 804a-3, as a function of position along the X axis of the frame of reference 102. The server 101 then selects, as the position for the edge, the position of the minimum intensity 816 in the intensity profile 812 (i.e., the darkest point in the intensity profile).
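A sketch of the grouping and position refinement at block 430 follows: overlapping hit windows are merged and the edge is placed at the column of minimum mean intensity, mirroring the intensity-profile minimum 816 described above. Building the profile from the column means of a grayscale image is an assumption made for the example.

```python
import numpy as np

def locate_edge_positions(image, hit_windows):
    """Block 430 sketch: merge overlapping hit windows (each an (x_start,
    x_end) pair in image coordinates) and return one x-position per merged
    group, namely the darkest column of its intensity profile."""
    groups = []
    for start, end in sorted(hit_windows):
        if groups and start <= groups[-1][1]:              # overlaps previous group
            groups[-1][1] = max(groups[-1][1], end)
        else:
            groups.append([start, end])
    positions = []
    for start, end in groups:
        profile = image[:, start:end].mean(axis=0)         # intensity vs. x (cf. 812)
        positions.append(start + int(np.argmin(profile)))  # minimum intensity (cf. 816)
    return positions
```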
The above process is repeated for each of the regions 800, and the server 101 generates a final set 820 of candidate view edges. As shown in FIG. 8, the set 820 includes the edges that were not successfully detected in the depth-based stage discussed above, but the set 820 does not include the edges corresponding to the rightmost sides of the products 112b and 112c, which should appear in the regions 824 and 828.
Following the completion of block 430, the image detector 126 passes the final set 820 of second-stage candidate view edges to the boundary generator 130 for further processing. Referring back to FIG. 4, at block 435, the boundary generator 130 combines the final sets 720 and 820 of candidate view edges from the detectors 128 and 126 (i.e., from the first and second stages of the method 400 as set forth above). The boundary generator 130 also groups candidate view edges, for example by combining each pair of edges that lie within a certain boundary distance of each other into a single edge.
FIG. 9 shows the final first and second sets 720 and 820 of candidate view edges (with the set 720 shown in broken lines to distinguish its edges from those of the set 820), along with a combined set 900 comprising all the edges of both sets 720 and 820. Also shown in FIG. 9 is a combined, grouped set 920 resulting from the grouping of edges discussed above.
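Block 435 can be sketched as the same distance-based merge applied across both sources; edges are again reduced to x-positions and the boundary distance is an assumed tunable value.

```python
def combine_edge_sets(depth_edges, image_edges, boundary_dist=0.02):
    """Block 435 sketch: take the union of the first-stage (720) and
    second-stage (820) edge x-positions, collapsing any pair closer
    together than boundary_dist into a single edge."""
    merged = []
    for x in sorted(depth_edges + image_edges):
        if merged and x - merged[-1] < boundary_dist:
            merged[-1] = (merged[-1] + x) / 2.0          # treat the pair as one edge
        else:
            merged.append(x)
    return merged
```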
Returning to FIG. 4, at block 440, the server 101 generates candidate view boundaries. In the present embodiment, the candidate view boundaries are bounding boxes generated by extending horizontal segments between the top and bottom ends of each adjacent pair of candidate view edges. That is, the adjacent pair of view edges forms the sides of a candidate boundary, while the aforementioned horizontal segments form the top and bottom of the candidate boundary.
FIG. 10 shows the combined, grouped set 920 of candidate edges along with the aforementioned horizontal segments. That is, the server 101 has generated eight candidate view boundaries 1000-1, 1000-2, 1000-3, 1000-4, 1000-5, 1000-6, 1000-7 and 1000-8.
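Block 440 might be sketched as follows, assuming image coordinates (y increasing downward) and each edge stored as a vertical segment (x, y_top, y_bottom); closing the box at the outermost ends of the pair is one possible design choice, not dictated by the text.

```python
def candidate_boundaries(edges):
    """Block 440 sketch: one bounding box per adjacent pair of candidate view
    edges. Each edge is (x, y_top, y_bottom); the pair supplies the left and
    right sides, and horizontal segments close the top and bottom."""
    edges = sorted(edges, key=lambda e: e[0])
    boxes = []
    for (x_l, top_l, bot_l), (x_r, top_r, bot_r) in zip(edges, edges[1:]):
        top = min(top_l, top_r)                 # upper segment joining the upper ends
        bottom = max(bot_l, bot_r)              # lower segment joining the lower ends
        boxes.append((x_l, top, x_r, bottom))   # (x_left, y_top, x_right, y_bottom)
    return boxes
```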
However, as will be apparent from a comparison of FIG. 10 with FIG. 5, certain candidate view boundaries do not actually correspond to actual products 112. To remove such false positives, the server 101 determines at block 445 whether each candidate view boundary 1000 is within a ROI indicator 508. The candidate view boundary 1000 need not be completely within a ROI indicator 508. For example, the server 101 may apply an overlap threshold at block 445 such that the determination at block 445 is affirmative when at least 90% of the candidate view boundary is within a ROI indicator 508. A wide variety of other threshold values can also be used at block 445. When the determination at block 445 is negative, the candidate view boundary 1000 is removed at block 450. If the determination at block 445 is affirmative, however, indicating that the candidate view boundary 1000 is valid (i.e., likely represents an actual product 112), the candidate view boundary is retained at block 455. At block 460, the server 101 completes the detection of product views by outputting the validated candidate view boundaries 1000 (i.e., those for which the determination at block 445 was affirmative). For example, the server may execute block 460 by permanently storing the validated candidate view boundaries 1000 in the repository 132. Each candidate view boundary 1000 is stored as a bounding box defined in the frame of reference 102. FIG. 10 shows a final set 1020 of output view boundaries, excluding the boundaries 1000-3 and 1000-7, which are not within any of the ROI indicators 508.
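The validation at blocks 445 to 455 reduces to an area-overlap test; a sketch follows, using the 90% figure mentioned above as the default threshold and assuming an (x0, y0, x1, y1) box format for both candidates and ROIs.

```python
def validate_boundaries(candidates, rois, min_overlap=0.9):
    """Blocks 445-455 sketch: keep a candidate view boundary only if at least
    min_overlap of its area lies inside some ROI indicator 508; the others
    are removed as false positives. Boxes are (x0, y0, x1, y1)."""
    def overlap_fraction(box, roi):
        bx0, by0, bx1, by1 = box
        rx0, ry0, rx1, ry1 = roi
        iw = max(0.0, min(bx1, rx1) - max(bx0, rx0))
        ih = max(0.0, min(by1, ry1) - max(by0, ry0))
        area = (bx1 - bx0) * (by1 - by0)
        return (iw * ih) / area if area > 0 else 0.0

    return [box for box in candidates
            if any(overlap_fraction(box, roi) >= min_overlap for roi in rois)]
```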
At block 460, the boundary generator 130 may also provide the validated candidate view boundaries 1000 to other downstream processes executed by the server 101 or by other computing devices. Examples of such downstream processes include gap detectors (for detecting gaps between views that can indicate that a product is out of stock) and the like. The boundary generator 130 may also drive a display connected to the server 101 to display the output set of view boundaries. In other words, the boundary generator may complete the detection of product views by presenting the output of the detection, the output including the validated candidate view boundaries 1000.
In the foregoing description, specific embodiments have been described. However, those skilled in the art will recognize that various modifications and changes can be made without departing from the scope of the invention as set out in the claims below. Therefore, the description and figures are to be understood by way of illustration rather than limitation, and all such modifications are intended to be included within the scope of the invention of the present description. For clarity and brief description, features are described herein as part of the same or separate embodiments, but it is to be understood that the scope of the invention may include embodiments having combinations of all or some of the features described. It will be understood that the embodiments shown have the same or similar components, except where they are described as being different.
The benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as critical, required, or essential features or elements of any or all of the claims. The invention is defined solely by the appended claims, including any amendments made during the pendency of this application and all equivalents of those claims as issued.
In addition, relational terms such as first and second, top and bottom, and the like may be used herein solely to distinguish one entity or action from another entity or action without necessarily requiring or implying an actual relationship or order between such entities or actions. The terms "comprises", "comprising", "has", "having", "contains", "containing" or any variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or arrangement that comprises, has or contains a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such a process, method, article or arrangement. An element preceded by "comprises ... a", "has ... a" or "contains ... a" does not, without further restrictions, preclude the existence of additional identical elements in the process, method, article or arrangement that comprises, has or contains the element. The term "a" is defined as one or more unless explicitly stated otherwise. The terms "substantially", "essentially", "approximately", "about" or any other version thereof are defined as being close to what is understood by those of skill in the art, and in one non-limiting embodiment the term is defined as being within 10%, in another embodiment within 5%, in another embodiment within 1%, and in another embodiment within 0.5%. The term "coupled" as used herein is defined as connected, although not necessarily directly and not necessarily mechanically. A device or structure that is "configured" in a certain way is configured in at least that way, but may also be configured in ways that are not described.
It will be appreciated that some embodiments may be contained in one or more generic or specialized processors (or "processing devices") such as microprocessors, digital signal processors, custom processors and field programmable gate arrays (FPGAs) and unique stored program instructions (including both software and firmware) which direct the one or more processors to implement, in conjunction with certain non-processor circuitry, some, most, or all of the functions of the method and / or arrangement described herein. Alternatively, some or all of the functions can be implemented by a state machine that does not contain any stored program instructions, or in one or more application specific integrated circuits (ASICs), in which any function or some combinations of certain functions are implemented as custom logic. Of course, a combination of the two approaches could be used.
In addition, an embodiment can be implemented as a computer-readable storage medium with computer-readable code stored thereon for programming a computer (e.g., comprising a processor) to perform a method as described and claimed herein. Examples of such computer readable storage media include, but are not limited to, a hard disk, a CD-ROM, an optical storage device, a magnetic storage device, a ROM (read-only memory), a PROM (programmable read-only memory), an EPROM (erasable programmable read-only memory), an EEPROM (electrically erasable programmable read-only memory), and a flash memory. Furthermore, it is expected that, notwithstanding potentially significant effort and many design choices motivated by, for example, available time, current technology and economic considerations, those skilled in the art, when guided by the concepts and principles described herein, will be readily capable of generating such software instructions and programs and ICs with minimal experimentation.
The summary of the disclosure is provided to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing "detailed description", it can be seen that various features are grouped together in different embodiments for the purpose of streamlining the description.
This manner of description is not to be interpreted as reflecting an intention that the claimed embodiments require more features than those expressly recited in each claim.
Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment.
Thus, the following claims are hereby incorporated into the "detailed description", with each claim standing on its own as separately claimed subject matter.
The mere fact that certain measures are defined in mutually different claims does not indicate that a combination of these measures cannot be used to an advantage.
A multitude of variants will be apparent to those skilled in the art.
All variants are understood to fall within the scope of the invention which is defined in the following claims.
Claims:
Claims (19)
[1]
A method by an image processing controller for detecting product views from recorded depth and image data, the method comprising: obtaining, at the image processing controller, (i) depth measurements from at least one depth sensor, the depth measurements representing a support structure that supports a plurality of product views, (ii) image data from at least one image sensor, the image data representing the support structure, and (iii) a set of region of interest (ROI) indicators, each ROI indicator indicating a position of a plurality of the product views; generating, by a depth detector of the image processing controller, a first set of candidate view edges from the depth measurements; generating, by an image detector of the image processing controller, a second set of candidate view edges from the image data; generating, by a boundary generator of the image processing controller: (i) a third set of candidate view edges by combining the first and second sets, and (ii) a candidate view boundary for each adjacent pair of candidate view edges in the third set of candidate view edges; selecting, by the boundary generator of the image processing controller, a subset of output view boundaries from the candidate view boundaries based on the ROI indicators; and
detecting, by the boundary generator of the image processing controller, product views including the selected subset of output view boundaries.
[2]
The method of claim 1, wherein generating the first set of candidate view edges comprises: generating a two-dimensional depth map from the depth measurements; and detecting edges in the depth map.
[3]
The method of claim 2, wherein detecting edges in the depth map comprises: applying an edge detect operation to the depth map to generate an edge weighted depth map, and applying a line detect operation to the edge weighted depth map.
[4]
The method of any preceding claim, further comprising removing candidate edges from the first set that are not within a ROI indicator.
[5]
The method of any preceding claim, wherein generating the second set of candidate view edges comprises: selecting a plurality of windows from the image data, and classifying each window as either containing an edge or not containing an edge.
[6]
The method of claim 5, further comprising: determining a position of each candidate view edge of the second set based on an intensity profile of a corresponding one of the windows.
[7]
The method of any preceding claim, wherein combining the first and second sets of candidate view edges comprises:
determining, for each pair of the adjacent candidate view edges, whether a distance separating the pair is below a threshold; and replacing the pair with a single candidate view edge when the distance separating the pair is below the threshold.
[8]
The method of any one of the preceding claims, wherein generating the candidate view boundaries comprises: joining upper ends of the adjacent pair of the third set of candidate view edges with an upper boundary segment; and joining lower ends of the adjacent pair of the third set of candidate view edges with a lower boundary segment.
[9]
The method of any preceding claim, wherein selecting the subset of output view boundaries includes removing candidate view boundaries that are outside the ROI indicators.
[10]
A computing device, comprising: a memory configured to store: (i) depth measurements from at least one depth sensor, the depth measurements representing a support structure that supports a plurality of product views, (ii) image data from at least one image sensor, the image data representing the support structure, and (iii) a set of region of interest (ROI) indicators, each ROI indicator indicating a position of a plurality of the product views; an image processing controller comprising: a depth detector configured to: obtain the depth measurements and the ROI indicators; and generate, from the depth measurements, a first set of candidate view edges; an image detector configured to: obtain the image data and the ROI indicators; and generate, from the image data, a second set of candidate view edges; and a boundary generator configured to: generate a third set of candidate view edges by combining the first and second sets; generate a candidate view boundary for each pair of adjacent candidate view edges in the third set of candidate view edges; select a subset of output view boundaries from the candidate view boundaries based on the ROI indicators; and detect product views including the selected subset of output view boundaries.
[11]
The computing device of claim 10, wherein the depth detector is configured to, for generating the first set of candidate view edges: generate a two-dimensional depth map from the depth measurements; and detect edges in the depth map.
[12]
The computing device of claim 11, wherein the depth detector is configured to, for detecting edges in the depth map: apply an edge detection operation to the depth map to generate an edge-weighted depth map; and apply a line detection operation to the edge-weighted depth map.
[13]
The computing device of any of claims 10 to 12, wherein the depth detector is further configured to remove candidate view edges from the first set that are not within an ROI indicator.
[14]
The computing device of any of claims 10 to 13, wherein the image detector is configured to, for generating the second set of candidate view edges: select a plurality of windows from the image data; and classify each window as either containing an edge or not containing an edge.
[15]
The computing device of claim 14, wherein the image detector is further configured to determine a position of each candidate view edge of the second set based on an intensity profile of a corresponding one of the windows.
[16]
The computing device of any of claims 10 to 15, wherein the boundary generator is configured to, for combining the first and second sets of candidate view edges: determine, for each pair of adjacent candidate view edges, whether a distance separating the pair is below a threshold; and replace the pair with a single candidate view edge when the distance separating the pair is below the threshold.
[17]
The computing device of any one of claims 10 to 16, wherein the boundary generator is configured to, for generating the candidate view boundaries: join upper ends of the adjacent pair of the third set of candidate view edges with an upper boundary segment; and join lower ends of the adjacent pair of the third set of candidate view edges with a lower boundary segment.
[18]
The computing device of any of claims 10 to 17, wherein the boundary generator is configured to remove candidate view boundaries that are outside the ROI indicators to select the subset of output view boundaries.
[19]
A non-transitory computer-readable medium storing instructions executable by an image processing controller to configure the image processing controller for: obtaining (i) depth measurements from at least one depth sensor, the depth measurements representing a support structure that supports a plurality of product views, (ii) image data from at least one image sensor, the image data representing the support structure, and (iii) a set of region of interest (ROI) indicators, each ROI indicator indicating a position of a plurality of the product views; generating a first set of candidate view edges from the depth measurements; generating a second set of candidate view edges from the image data; generating a third set of candidate view edges by combining the first and second sets; generating a candidate view boundary for each pair of adjacent candidate view edges in the third set of candidate view edges; selecting a subset of output view boundaries from the candidate view boundaries based on the ROI indicators; and detecting product views including the selected subset of output view boundaries.
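The claims above describe a multi-stage pipeline. The sketches that follow are illustrative only and are not part of the claims or the described embodiments. The first sketch corresponds to the edge detection and line detection of claims 2, 3, 11 and 12, assuming the depth measurements have already been projected into a two-dimensional depth map, that OpenCV is available (the patent does not name a library), and that candidate view edges are near-vertical; the function name, Canny thresholds and Hough parameters are illustrative choices.

import numpy as np
import cv2  # assumption: OpenCV is used; the patent does not name a specific library

def detect_depth_edge_candidates(depth_map, canny_lo=30, canny_hi=90):
    """Detect near-vertical candidate view edges in a two-dimensional depth map.

    The depth map is normalised to 8 bits, an edge detection operation produces
    an edge-weighted image, and a line detection operation extracts line
    segments, which are reduced to horizontal (x) positions.
    """
    d8 = cv2.normalize(depth_map.astype(np.float32), None, 0, 255,
                       cv2.NORM_MINMAX).astype(np.uint8)
    edge_weighted = cv2.Canny(d8, canny_lo, canny_hi)        # edge detection operation
    lines = cv2.HoughLinesP(edge_weighted, 1, np.pi / 180,   # line detection operation
                            threshold=50,
                            minLineLength=depth_map.shape[0] // 4,
                            maxLineGap=10)
    candidates = []
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            if abs(int(x1) - int(x2)) <= 2:                  # keep near-vertical segments only
                candidates.append((float(x1) + float(x2)) / 2.0)
    return sorted(candidates)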
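Claims 5, 6, 14 and 15 classify image windows and refine an edge position from a window's intensity profile. The sketch below shows only the refinement step, under the assumption that the window is a grayscale NumPy array and that the edge lies where the mean column intensity changes most sharply; the classifier itself is omitted and all names are illustrative.

import numpy as np

def refine_edge_position(window, x_offset=0):
    """Estimate an edge position inside a window classified as containing an edge.

    The mean intensity of each column of the (grayscale) window forms an
    intensity profile; the edge is assumed to lie at the column where that
    profile changes most sharply.
    """
    profile = window.mean(axis=0)                 # intensity profile across columns
    gradient = np.abs(np.diff(profile))           # change between adjacent columns
    return x_offset + int(np.argmax(gradient))    # column with the sharpest change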
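Claims 7 and 16 combine the depth-derived and image-derived candidate view edges and replace pairs closer than a threshold with a single edge. A minimal sketch is given below, assuming each candidate edge has been reduced to a horizontal position (for example an x coordinate in a common frame of reference); replacing the pair by its midpoint, and the function name, are illustrative choices not mandated by the claims. Because the two sets are pooled and sorted before merging, the order of the inputs does not matter.

def merge_candidate_edges(depth_edges, image_edges, threshold):
    """Combine two sets of candidate view edge positions into a third set.

    Edges in the combined, sorted list whose separation is below `threshold`
    are replaced by a single candidate edge at their midpoint.
    """
    combined = sorted(list(depth_edges) + list(image_edges))
    merged = []
    for x in combined:
        if merged and (x - merged[-1]) < threshold:
            merged[-1] = (merged[-1] + x) / 2.0   # replace the pair with a single edge
        else:
            merged.append(x)
    return merged

# Example: merge_candidate_edges([10.0, 55.0], [12.0, 90.0], threshold=5.0)
# returns [11.0, 55.0, 90.0].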
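Claims 8 and 17 form a candidate view boundary for each adjacent pair of edges by joining their upper ends with an upper boundary segment and their lower ends with a lower boundary segment. The sketch below assumes each edge is a near-vertical segment represented as (x, y_top, y_bottom) in image coordinates with y increasing downwards; representing the resulting boundary as an axis-aligned box is an illustrative simplification.

def generate_candidate_boundaries(edges):
    """Form a candidate view boundary for each adjacent pair of candidate edges.

    Each edge is a near-vertical segment (x, y_top, y_bottom). The upper ends of
    an adjacent pair are joined by an upper boundary segment and the lower ends
    by a lower boundary segment, yielding a box (x_left, y_top, x_right, y_bottom).
    """
    ordered = sorted(edges, key=lambda e: e[0])
    boundaries = []
    for left, right in zip(ordered, ordered[1:]):
        top = min(left[1], right[1])       # upper boundary segment joining the upper ends
        bottom = max(left[2], right[2])    # lower boundary segment joining the lower ends
        boundaries.append((left[0], top, right[0], bottom))
    return boundaries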
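Finally, claims 9 and 18 retain only the candidate view boundaries that fall within the ROI indicators. The sketch below assumes boundaries and ROIs are axis-aligned boxes (x0, y0, x1, y1); the overlap-ratio test and the min_overlap parameter are assumptions introduced for illustration, since the claims only require removing boundaries that lie outside the ROI indicators.

def select_output_boundaries(candidates, rois, min_overlap=0.5):
    """Keep only the candidate view boundaries that fall within an ROI indicator.

    Candidates and ROIs are axis-aligned boxes (x0, y0, x1, y1); a candidate is
    retained when at least `min_overlap` of its area lies inside any ROI.
    """
    def overlap_ratio(box, roi):
        x0, y0 = max(box[0], roi[0]), max(box[1], roi[1])
        x1, y1 = min(box[2], roi[2]), min(box[3], roi[3])
        inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
        area = max(1e-9, (box[2] - box[0]) * (box[3] - box[1]))
        return inter / area

    return [c for c in candidates
            if any(overlap_ratio(c, r) >= min_overlap for r in rois)]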
Patent family:
Publication number | Publication date
US20200380454A1 | 2020-12-03
WO2020247068A1 | 2020-12-10
BE1027283A1 | 2020-12-09
FR3096816A1 | 2020-12-04
Cited documents:
Publication number | Filing date | Publication date | Applicant | Patent title
US20170286901A1 | 2016-03-29 | 2017-10-05 | Bossa Nova Robotics Ip, Inc. | System and Method for Locating, Identifying and Counting Items
US20190073559A1 | 2017-09-07 | 2019-03-07 | Symbol Technologies, Llc | Method and apparatus for shelf edge detection
US20150310601A1 | 2014-03-07 | 2015-10-29 | Digimarc Corporation | Methods and arrangements for identifying objects
US10176452B2 | 2014-06-13 | 2019-01-08 | Conduent Business Services Llc | Store shelf imaging system and method
JP6728404B2 | 2016-05-19 | 2020-07-22 | Simbe Robotics, Inc. | How to track product placement on store shelves
US10592854B2 | 2015-12-18 | 2020-03-17 | Ricoh Co., Ltd. | Planogram matching
WO2018204308A1 | 2017-05-01 | 2018-11-08 | Symbol Technologies, Llc | Method and apparatus for object status detection
Legal status:
2021-04-23 | FG | Patent granted | Effective date: 2021-02-25
Priority:
Application number | Filing date | Patent title
US16/429,820 | US20200380454A1 | 2019-06-03 | 2019-06-03 | Method, System and Apparatus for Detecting Product Facings